GitHub says Copilot improves code quality – but are AI coding tools actually producing results for developers?
Software development has frequently been identified as an area ripe for improvement through generative AI adoption, but a recent study challenges assumptions about how beneficial AI coding tools really are for developers.
The data science team at software development specialists Uplevel looked into the impact generative AI coding assistants are having on the efficiency and efficacy of developers.
The investigation found that developer performance remained largely unchanged whether or not they used AI tools.
Uplevel’s findings counter claims made by GitHub and other industry stakeholders, who have stated that generative AI increases code quality and developer efficiency.
Uplevel Data Labs analyzed the performance of 800 developers among its customers, examining the difference in performance between teams with and without access to GitHub Copilot, one of the most popular AI coding assistants.
The research looked at metrics including cycle time, pull request (PR) throughput, bug rate, and extended working hours (also referred to as ‘always on’ time).
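For context on how such metrics are typically derived, here is a minimal sketch of how average cycle time and PR throughput might be computed from pull request timestamps. The PullRequest structure, field names, and sample figures are illustrative assumptions, not Uplevel’s actual methodology.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class PullRequest:
    opened_at: datetime  # when the PR was opened
    merged_at: datetime  # when the PR was merged

def avg_cycle_time_hours(prs: list[PullRequest]) -> float:
    """Average time from open to merge, in hours."""
    return mean((pr.merged_at - pr.opened_at).total_seconds() / 3600 for pr in prs)

def weekly_throughput(prs: list[PullRequest], weeks: float) -> float:
    """Merged PRs per week over the observation window."""
    return len(prs) / weeks

# Hypothetical data: two PRs merged within a one-week window
prs = [
    PullRequest(datetime(2024, 9, 2, 9, 0), datetime(2024, 9, 3, 15, 0)),
    PullRequest(datetime(2024, 9, 4, 10, 0), datetime(2024, 9, 4, 18, 0)),
]
print(f"average cycle time: {avg_cycle_time_hours(prs):.1f} hours")  # 19.0 hours
print(f"throughput: {weekly_throughput(prs, weeks=1):.1f} PRs/week")  # 2.0 PRs/week
```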
Uplevel found that, overall, Copilot provided no significant change in efficiency metrics.
Comparing the throughput, cycle time, and complexity of pull requests, including PRs with associated tests, the researchers found Copilot did not meaningfully help or hinder developers and had no effect on coding speed.
Some metrics did shift, but the changes recorded were deemed ‘inconsequential’. For example, Uplevel recorded a 1.7-minute decrease in average cycle time for developers with access to GitHub Copilot.
This clashes with findings from GitHub, which indicated that in the two years since GitHub Copilot was released to the public, developers were coding 55% faster.
Copilot-assisted code found to contain more errors
The developer platform recently published the results of another study of 202 developers, which found that those with access to Copilot wrote code with greater functionality, readability, and overall quality.
For example, GitHub found readability of code written with the assistance of Copilot improved by 3.62%, with similar increases in reliability (2.94%), maintainability (2.47%), and conciseness (4.16%).
GitHub said these numbers were statistically significant, but the findings from Uplevel’s investigation revealed more concerning effects of AI coding tools on code quality.
The firm found that while throughput stayed at around the same level, code quality actually decreased, with significantly higher bug rates.
The bug rate in code written by developers using Copilot was found to have increased by 41%, which, taken alongside the unchanged throughput, suggests that Copilot access may actually be hurting code quality, according to Uplevel.
Not only did Uplevel’s research indicate that generative AI coding tools may not be improving developers’ efficiency or efficacy, the data also showed Copilot access was not particularly effective in mitigating the risk of developer burnout.
Uplevel’s ‘Sustained Always On’ metric, which measures extended working time outside of standard hours (a primary indicator of burnout), was found to have decreased for both groups.
For those with Copilot access, this decrease was 17%, whereas it was measured at 28% for those not using the coding assistant.
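Uplevel has not published the exact formula behind this metric, but as a rough illustration, an ‘always on’ style measure could be approximated by flagging activity timestamps that fall outside standard hours. Everything in this sketch, including the 9-to-6 working window, is a hypothetical assumption.

```python
from datetime import datetime

def always_on_fraction(events: list[datetime],
                       start_hour: int = 9, end_hour: int = 18) -> float:
    """Share of activity events outside standard working hours
    (evenings or weekends), a rough proxy for 'always on' time."""
    def outside(ts: datetime) -> bool:
        return ts.weekday() >= 5 or not (start_hour <= ts.hour < end_hour)
    return sum(outside(ts) for ts in events) / len(events)

# Hypothetical data: three commits on a Monday, one pushed late at night
events = [
    datetime(2024, 9, 2, 10, 15),
    datetime(2024, 9, 2, 14, 40),
    datetime(2024, 9, 2, 22, 5),
]
print(f"always-on fraction: {always_on_fraction(events):.0%}")  # 33%
```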
Research indicates that appetite for AI coding tools among developers and business leaders has grown significantly over the last year, but the jury is still out on whether these solutions are currently delivering tangible value for the software development industry.